{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<!--BOOK_INFORMATION-->\n",
    "<a href=\"https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv\" target=\"_blank\"><img align=\"left\" src=\"data/cover.jpg\" style=\"width: 76px; height: 100px; background: white; padding: 1px; border: 1px solid black; margin-right:10px;\"></a>\n",
    "*This notebook contains an excerpt from the book [Machine Learning for OpenCV](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv) by Michael Beyeler.\n",
    "The code is released under the [MIT license](https://opensource.org/licenses/MIT),\n",
    "and is available on [GitHub](https://github.com/mbeyeler/opencv-machine-learning).*\n",
    "\n",
    "*Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations.\n",
    "If you find this content useful, please consider supporting the work by\n",
    "[buying the book](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv)!*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Visualizing Data from an External Dataset](02.04-Visualizing-Data-from-an-External-Dataset.ipynb) | [Contents](../README.md) | [First Steps in Supervised Learning](03.00-First-Steps-in-Supervised-Learning.ipynb) >"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Dealing with Data Using OpenCV's TrainData Container in C++\n",
    "\n",
    "For the sake of completeness, and for those who insist on using the C++ API of OpenCV, let's do a quick detour on OpenCV's <tt>TrainData</tt> container that allows us to load numerical data from .csv files.\n",
    "\n",
    "Among other things, in C++ the `ml` module contains a class called `TrainData`, which\n",
    "provides a container to work with data in C++. Its functionality is limited to reading\n",
    "(preferably) numerical data from .csv files (containing comma-separated values). Hence, if\n",
    "the data that you want to work with comes in a neatly organized `.csv` file, this class will\n",
    "save you a lot of time. If your data comes from a different source, I'm afraid your best\n",
    "option might be to create a `.csv` file by hand, using a suitable program such as Open Office\n",
    "or Microsoft Excel."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "If you have some nice all-float data that lives in a comma-separated file, you could load it like so:\n",
    "\n",
    "    Ptr<TrainData> tDataContainer = TrainData::loadFromCSV(\"file.csv\",\n",
    "                                        0,  // number of lines to skip\n",
    "                                        0); // index of 1st and only output var"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Shuffling data is easy:\n",
    "\n",
    "    tDataContainer->shuffleTrainTest();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "And so is splitting data into training and test sets. For example, if your file contains 100 samples, you could assign the first 90 samples to the training set, and leave the remaining 10 to the test set:\n",
    "\n",
    "    tDataContainer->setTrainTestSplit(90);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "You can also load all training/test samples and store them in an OpenCV matrix:\n",
    "\n",
    "    cv::Mat trainData = tDataContainer->getTrainSamples();\n",
    "    cv::Mat testData = tDataContainer->getTestSamples();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Most OpenCV classifiers from the <tt>ml</tt> module then directly take training data as an input. Let's say you have a statistical model saved in the variable <tt>model</tt>:\n",
    "\n",
    "    model->train(tDataContainer->getTrainSamples());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "You can find all relevant functions described here: http://docs.opencv.org/3.1.0/dc/d32/classcv_1_1ml_1_1TrainData.html.\n",
    "\n",
    "Other than that, I'm afraid the `TrainData` container and its use cases might be a bit\n",
    "antiquated. So for the rest of the book, we will focus on Python instead."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Visualizing Data from an External Dataset](02.04-Visualizing-Data-from-an-External-Dataset.ipynb) | [Contents](../README.md) | [First Steps in Supervised Learning](03.00-First-Steps-in-Supervised-Learning.ipynb) >"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
